Question & Answer

Computer science


2. Select two variables that may have association with Hb. Hint: This is real data, and relationships may appear different from standard textbook patterns. Justify your choice using dplyr() or base R tool(). Show your working in R code and any output you produce in support of your argument. Marks (4).

Use your investigation in QC 2 to replace all missing values in the variable Hb. Show your code. Please do not use any automatic imputation package or routines such as Hmisc. Marks (6).

Present a comparison, with discussion, between the two types of imputation in the context of variable Hb. Hint: You may use appropriate proximity and/or visualisation tools learned in topics of Weeks 3-7 for comparison.

Below is the Excel file data, bowel.csv (file name):

ID Age_admission Sex Disease_duration FC CRP Alb Hb Colectomy IFX_pre IFX_admission

4033901 72 1 0.25 32 26 95 1 1 1

1340884 32 1 1 1400 19 26 102 1 0 0

4054914 28 1 1.5 86 27 67 1 0 0

1275417 54 1 5 2660 6.7 37 132 1 0 0

2004523 28 1 1 7230 274 33 142 1 0 0

2145612 58 2 8 27 24 121 1 0 0

650942 59 1 33 39 23 70 2 0 0

964463 56 2 6 54 33 1 1 0

8227811 29 2 0 678 57 33 77 1 0 0

4025493 60 2 20 2310 107 28 90 1 0 0

4134290 38 1 1.5 345 26 33 1 0 0

9018394 27 1 0 1370 234 32 133 1 0 0

8148946 40 1 0.5 1620 88 32 107 1 0 0

4155570 41 1 704 228 22 100 1 0 0

2079662 31 1 11 280 22 135 1 0 0

2273159 31 1 1 17 21 55 1 0 0

2029820 45 2 6 96 30 1 0 0

5000920 18 1 0.5 92 41 154 1 0 0

2284199 42 1 2 22 32 1 1 1

2287574 27 1 1 100 31 139 1 0 0

2257547 28 2 10 3000 83 28 116 1 0 0

2264655 56 1 3.5 1.7 36 136 1 1 1

4005015 23 1 3 6 32 141 1 0 0

4011970 27 2 4 3000 8.9 27 118 1 0 0

2272829 28 2 4 219 30 1 0 0

4021463 27 2 0 3000 100 29 90 1 0 0

4021733 20 1 5 155 42 140 1 0 0

1357276 26 2 13 153 32 137 1 0 0

2278456 41 2 4 104 27 149 1 0 0

811148 63 2 8 92 29 100 1 0 0

5008426 26 2 0 187 24 132 1 0 0

4107410 23 2 3 3210 46 36 147 1 0 0

978658 51 2 7 1504 31 33 142 1 0 0

5012529 20 2 0 4090 37 24 136 1 0 0

4033348 51 2 24 3000 7.1 31 120 1 0 0

1380102 55 2 8 212 31 31 136 1 0 0

4068450 18 2 1 248 27 120 1 0 0

4031825 27 2 1 1 33 1 1 1

2054914 29 1 0.25 666 37.7 31 100 1 1 1

4069818 34 1 0 12 34 136 1 0 0

4073025 23 1 0 55 22 114 1 0 0

4075666 54 2 2 46 24 75 2 0 0

1080538 68 1 10 111 28 116 1 0 1

4038486 25 2 3 5540 1.5 37 116 1 0 0

5000920 20 1 3 32 35 97 1 0 0

2089553 34 2 5 153 12.2 35 1 1 1

1287731 28 1 14 759 20.5 29 123 1 0 0

961558 45 1 1 10100 39.8 38 122 1 0 0

4085609 49 1 6 2410 43 27 148 1 0 0

8176784 36 1 0 49 30 125 1 0 0

4011970 28 2 11 972 8 37 136 1 0 0

2030675 51 1 2 7240 67 17 89 1 0 0

4102608 29 1 0.2 1070 50 31 140 1 0 0

Marks for the comparison part: (5).

==================================================================
# Function to install (if needed) and load a package
install_and_load <- function(package) {
  if (!require(package, character.only = TRUE)) {
    install.packages(package, dependencies = TRUE)
    library(package, character.only = TRUE)
  }
}

# Install and load necessary packages
install_and_load("dplyr")
install_and_load("ggplot2")

# Load the data (blank numeric fields are read as NA)
data <- read.csv("path/to/bowel.csv")

# View the structure of the data (str() prints directly)
str(data)

# Summary of the data
print(summary(data))

# a. Justify your choice using correlation
# Correlation matrix over the numeric columns, excluding the ID column.
# Pairwise deletion keeps rows that are only missing values in other columns.
num_data <- data[, sapply(data, is.numeric)]
num_data <- num_data[, names(num_data) != "ID", drop = FALSE]
cor_matrix <- cor(num_data, use = "pairwise.complete.obs")

# Print the correlation matrix
print(cor_matrix)

# Rank variables by absolute correlation with Hb, dropping Hb itself,
# so that strong negative associations are not overlooked
selected_vars <- cor_matrix["Hb", ]
selected_vars <- selected_vars[names(selected_vars) != "Hb"]
selected_vars <- selected_vars[order(abs(selected_vars), decreasing = TRUE)]

# Top 2 variables with the strongest correlation with Hb
top_2_vars <- names(selected_vars)[1:2]
print(top_2_vars)

# Plotting to visualize relationships
ggplot(data, aes(x = CRP, y = Hb)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ggtitle("CRP vs Hb")

ggplot(data, aes(x = Alb, y = Hb)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ggtitle("Albumin vs Hb")

# b. Replace missing values in Hb using a regression model
# Keep the original Hb so both imputations in part c start from the same values
hb_original <- data$Hb
missing_idx <- is.na(hb_original)
print(paste("Number of missing values in Hb:", sum(missing_idx)))

# Fit a linear model on the rows where Hb, CRP and Alb are all observed
model <- lm(Hb ~ CRP + Alb, data = data, na.action = na.exclude)

# Predict Hb for the rows where it is missing
# (the prediction is NA if CRP or Alb is also missing in that row)
predicted_values <- predict(model, newdata = data[missing_idx, ])

# Replace missing values in Hb
data$Hb[missing_idx] <- predicted_values

# Verify if there are any missing values left
missing_count_after <- sum(is.na(data$Hb))
print(paste("Number of missing values in Hb after imputation:", missing_count_after))

# c. Comparison of imputation methods
# Mean imputation, applied to the original (un-imputed) Hb
data_mean_imputed <- data
data_mean_imputed$Hb <- hb_original
data_mean_imputed$Hb[missing_idx] <- mean(hb_original, na.rm = TRUE)

# Visual comparison: blue = regression-imputed data, red circles = mean-imputed data
# (the two differ only in the rows that were imputed)
ggplot(data, aes(x = CRP, y = Hb)) +
  geom_point(color = "blue") +
  geom_point(data = data_mean_imputed, aes(x = CRP, y = Hb), color = "red", shape = 1) +
  ggtitle("Comparison of Imputation Methods")

# Statistical comparison of the imputed values themselves
imputed_comparison <- data.frame(
  Regression_Imputed = data$Hb[missing_idx],
  Mean_Imputed = data_mean_imputed$Hb[missing_idx]
)
print(imputed_comparison)

# Difference between the mean of the observed Hb and the mean after each imputation
mean_diff_mean <- mean(hb_original, na.rm = TRUE) - mean(data_mean_imputed$Hb)
mean_diff_regression <- mean(hb_original, na.rm = TRUE) - mean(data$Hb, na.rm = TRUE)

# Print mean differences
print(paste("Mean difference (Original vs Mean Imputation):", mean_diff_mean))
print(paste("Mean difference (Original vs Regression Imputation):", mean_diff_regression))
===============================================================================

Run the above code and attach all required screenshot outputs. Thanks :) Please add the output ASAP.
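For the comparison part, one possible reading of "proximity" is a distance between the two imputed versions of Hb, looked at alongside a simple spread measure. The sketch below shows the idea in base R. It runs on a small toy data frame whose values are made up for illustration and are not taken from bowel.csv, and the names `toy`, `hb_mean`, `hb_reg` and `euclid` are hypothetical. The single predictor Alb is also only an assumption for the sketch.

```r
# Toy data: hypothetical values, NOT taken from bowel.csv
toy <- data.frame(
  Alb = c(20, 25, 30, 35, 40, 28, 33),
  Hb  = c(82, 98, NA, 141, 158, NA, 130)
)
miss <- is.na(toy$Hb)

# Mean imputation: every missing Hb gets the mean of the observed values
hb_mean <- toy$Hb
hb_mean[miss] <- mean(toy$Hb, na.rm = TRUE)

# Regression imputation: predict the missing Hb from Alb
fit <- lm(Hb ~ Alb, data = toy)
hb_reg <- toy$Hb
hb_reg[miss] <- predict(fit, newdata = toy[miss, , drop = FALSE])

# Proximity: Euclidean distance between the two imputed vectors
# (only the imputed positions contribute; observed values are identical)
euclid <- sqrt(sum((hb_mean - hb_reg)^2))

# Spread: standard deviation of observed Hb vs each imputed version
sd_observed <- sd(toy$Hb, na.rm = TRUE)
sd_mean <- sd(hb_mean)
sd_reg <- sd(hb_reg)

print(c(euclid = euclid, sd_observed = sd_observed,
        sd_mean = sd_mean, sd_reg = sd_reg))
```

Mean imputation leaves the mean of Hb unchanged but reduces its standard deviation, because the filled-in values add no variation. Regression imputation lets the filled-in values follow the predictor instead. Comparing the printed standard deviations shows how each method changes the spread of Hb.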
